Recent works on bounding the output size of a conjunctive query with functional dependencies and degree constraints have shown a deep connection between fundamental questions in information theory and database theory. We prove analogous output bounds for disjunctive datalog rules, and along the way answer several open questions regarding the tightness and looseness of these bounds. Our bounds are intimately related to Shannon-type information inequalities. We devise the notion of a "proof sequence" for a specific class of Shannon-type information inequalities called "Shannon flow inequalities". We then show how such a proof sequence can be interpreted as symbolic instructions guiding an algorithm called "PANDA", which answers disjunctive datalog rules within the time predicted by the size bound. We show that PANDA can be used as a black box to devise algorithms matching precisely the fractional hypertree width and the submodular width runtimes for aggregate and conjunctive queries with functional dependencies and degree constraints.

Our results improve upon known results in three ways. First, our bounds and algorithms apply to the much more general class of disjunctive datalog rules, of which conjunctive queries are a special case. Second, the runtime of PANDA matches precisely the submodular width bound, whereas the previous algorithm by Marx has a runtime that is polynomial in this bound. Third, our bounds and algorithms work for queries with input cardinality bounds, functional dependencies, and degree constraints.

Overall, our results show a deep connection between three seemingly unrelated lines of research; and our results on proof sequences for Shannon flow inequalities might be of independent interest.